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ABSTRACT 

Recently, there has been a great deal of interest in understanding the reionization of hydrogen in the intergalactic 
medium (IGM). One of the major outstanding questions is how this event proceeds on large scales. Motivated 
by numerical simulations, we develop a model for the growth of H II regions during the reionization era. We 
associate ionized regions with large-scale density fluctuations and use the excursion set formalism to model the 
resulting size distribution. We then consider ways in which to characterize the morphology of ionized regions. 
We show how to construct the power spectrum of fluctuations in the neutral hydrogen field. The power spectrum 
contains definite features from the H II regions which should be observable with the next generation of low- 
frequency radio telescopes through surveys of redshifted 21 cm emission from the reionization era. Finally, 
we also consider statistical descriptions beyond the power spectrum and show that our model of reionization 
qualitatively changes the distribution of neutral gas in the IGM. 
Subject headings: cosmology: theory - intergalactic medium - diffuse radiation 
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1. INTRODUCTION 

The reionization of the intergalactic medium (IGM) is one 
of the landmark events in early structure formation. It marks 
the epoch at which radiative feedback from luminous objects 
impacted the farthest reaches of the Universe - the point at 
which structure formation affected every baryon in the Uni- 
verse, albeit indirectly. The timing, duration, and character of 
this event contain an enormous amount of information about 
the first cosmic structures and also have important implica- 
tions for later generations of baryonic objects. For these rea- 
sons, a great deal of attention - both observational and theo- 
retical - has recently been focused on this process. Most sig- 
nificantly, reionization is an important signpost that connects 
several disparate observations. Currently the data provide tan- 
talizing hints about the ionization history of the Universe but 
few definitive answers. Observations of Lya absorption in the 
spectra of high-redshift quasars indicate that reionizati on ends 
at z ^ 6.5 ( Becker etal. 2001; Fan et al. 2002; Whi te et alJ 



I2003t IWvit he & Loeb 2004), although this interpretation is 
controversial (Sonaaila 2004). The main difficulty with these 
measurements is that the Lya optical depth is extremely large 
in a fully neutral medium (Gunn & Peterson 1965), making 
it difficult to place strong constraints when the neutral frac- 
tion exceeds ~ 10"^. On the other hand, measurements of 
the cosmic microwave background (CMB) imply a high opti- 
cal depth to electron scattering, apparently requiring reion- 
ization to begin at z > 14 (Koautetal. 2003; Sperael et al. 
12003). Unfortunately, the CMB data provide only an inte- 
gral constraint on the ionization history. Taken together, these 
observations rule out simple pictures of fast reionization (e.g., 
iBarkana & L oeb"2001 and references therein), but it is not 
yet clear what they do imply about early generations of lumi- 
nous sources (Sokasian et al. 2003a b; Wvithe & Loeb 2003; 
[Cen2(303; Haiman & Holder 2003; .Onken & Miralda-Escude. 
l2003UFnkngita & Kawasakil2003ll. 
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At the same time, our theoretical understanding of how 
reionization proceeds, given some source population, has 
been advancing rapidly. Most models of reionization are 
basedon the growth of H II regions around individual galaxies 
llArons& Wing ert 1972; Barka na & Loebl2001i) . These mod- 
els use semi-analytic techniques to compute the evolution of 
global quantities like the total ionized fraction. However, they 
are unable to describe the morphology of reionization, an is- 
sue that is both theoretically interesting and observationally 
accessible. Morphology is a difficult aspect to address be- 
cause it depends on such factors as the locations of individual 
sources, the dumpiness of the IGM, the underlying density 
distribution (the cosmic web), recombinations, and radiative 
transfer effects. Some semi-analytic models have been de- 
veloped to address these issues. The most popular assumes 
that reionization is controlled by recombinations an d pro- 
ceeds from low to high density regions ( Miralda-Escud e et alJ 
l2000h . but these models are approximate at best. Fortunately, 
it has recently become possible to incorporate radiative trans- 
fer into numerical simulations of the reionization era ( Gnedin] 
.2000; Sokasian et al 2003a b; Ciardi et al. 2003) . at least as a 
post-processing step. Because simulations can include all of 
the above processes (with the possible exception of dumpi- 
ness, which still requires extremely high mass resolution), 
they give a more nuanced view of the reionization process. 
One of the chief lessons of the simulations is that reionization 
is significantly more inhomogeneous than expected. The clas- 
sical picture of a large number of small H II regions surround- 
ing individual galaxies does not match the simulations well; 
instead, a relatively small number of large ionized regions ap- 
pear around clusters of sources (see, for example. Figure 6 of 
l^kasian et al. 2003a). Moreover, in the simulations reioniza- 
tion proceeds from high to low density regions, implying that 
recombinations play only a secondary role in determining the 
morphology of reionization; instead, large-scale bias plays a 
dominant role. This picture suggests a new approach to an- 
alytic models of reionization, one that takes into account the 
large-scale density fluctuations that are ultimately responsible 
for this ionization pattern. We describe such a model in ^ 
We derive the size distribution of H II regions in a way analo- 
gous to the Press & Schechter ( 1974) halo mass function and 
show that it has the quahtative features seen in the Simula- 



2 



tions. 

Observing the morphology of H II regions requires new ob- 
servational techniques as well as robust ways to character- 
ize the data. Among the most exciting approaches to study 
the reionization process are surveys of 21 cm emission from 
neutral hydrogen at high redshifts (Field 1958; Sc ott & ReesI 
[1990; Madau et al. 1997; Zaldarriaaa et al. 2004). The idea 
behind these observations is to map the fluctuating pattern of 
emission (or absorption) from neutral hydrogen in the Uni- 
verse over a range of frequencies. This yields a measurement 
of fluctuations due to both the density field and the H II re- 
gions. Because 21 cm emission comes from a single spec- 
tral line, such observations allow a three-dimensional recon- 
struction of the evolution of neutral hydrogen in the IGM 
and, owing to the design of low-frequency radio telescopes, 
can probe the large cosmological volumes needed to study 
the IGM. Despite numerous technical challenges, instruments 
able to make the necessary observations will be built in the 
coming years. These include the Primeval Structure Tele- 
scope (PAST)* the Low Frequency Array (LOFAR),^ and the 
Square Kilometer Array (SKA).^ The major obstacles to high- 
redshift 21 cm observations are the many bright foreground 
sources, which include Galactic synchrot ron em ission, free- 
free emission from galax ies (Oh & Mac3 l2003h . faint radio- 
loud quasars (Di Matteo et al. 2002), and synchrotron emis- 
sion from low-redshift galaxy clusters ( Di Matteo et al. 2004). 
Fortunately, all of these foregrounds have smooth continuum 
spectra. Zaldarriaga et al. (2004; hereafter ZFH04) showed 
that the foregrounds can be removed to high precision because 
the 21 cm signal itself is uncorrected over relatively small 
frequency ranges. ZFH04 also showed how to compute the 
angular power spectrum of the 21 cm sky as a function of 
frequency, given some model of reionization. This is the sim- 
plest statistical measure of the morphology of H II regions, 
and they showed that it is quite powerful in distinguishing dif- 
ferent stages of reionization (at least in a simple toy model). 

Of course, predictions for the 21 cm signal, or any other 
measurement of reionization, rely on an accurate model 
for reionization. The initial conditions are straightforward: 
when the neutral fraction xh ~ 1, the power spectrum fol- 
lows that of the density (Tozzietal. 2000). This breaks 
down when the H II regions appear, and more sophisti- 
cated approaches become necessary. There have been sev- 
eral recent attempts to predict the signal using numerical 
simulations ("Ciardi & Madau' l2003t iFurlanetto et"al] l2004at 
[Onedin & Shaver 2003). However, the boxes in these simula- 
tions have sizes < 20 Mpc, subtending only a few arcminutes 
on the sky. Because the angular resolution of low frequency 
radio telescopes at the required sensitivity is also a few ar- 
cminutes (ZFH04), such predictions require extrapolations to 
larger scales, which may be dangerous given the already large 
sizes of ionized regions. With our analytic model for the size 
distribution of these bubbles, we are able to make the first de- 
tailed statistical characterizations of the fluctuating ionization 
pattern on the large scales most relevant to observations. We 
begin with the power spectrum as the simplest description. In 
|3l we show how to compute the power spectrum for an ar- 
bitrary size distribution, and in ^we apply that formalism to 
our model of H II regions. 

The power spectrum is only one way to describe the 21 

* See'httpV/astrophysics. phys.cmu.edu/~jbp for details on PAST. 

^ See http://www.lofar.org for details on LOFAR. 

' See http://www.skatelescope.org for details on the SKA. 



cm field. It is an excellent approximation if fluctuations 
in the matter density dominate the signal, because the den- 
sity field is neary gaussian on the large scales of interest 
here. Unfortunately, once the H II regions dominate the power 
spectrum, th e probability distribution is no longer gaussian 
dMorales & Hewi tt 2003). There have, however, been no at- 
tempts to describe the distribution. Using our model for reion- 
ization, we discuss some of the relevant features in |5l We 
show that the reionization model qualitatively changes the 
character of fluctuations in the neutral gas density. 

This paper is primarily concerned with developing a gen- 
eral and useful model for the morphology of reionization and 
with the crucial features of that model. We will focus here 
on the physics of reionization rather than on their observ- 
able consequences; we consider some of these in a compan- 
ion paper about the 2 1 cm signal expected from high redshifts 
(Furlanetto et al. 2004b). 

In our numerical calculations, we assume a cosmology with 
n,„ = 0.3, r^A = 0.7, = 0.046, H = lOO/i km s"' Mpc"' (with 
h = 0.7), n = 1, and = 0.9, consistent with the most recent 
measurements (Spergel e t al..2003) . 

2. H II REGIONS DURING REIONIZATION 

In this section, we will calculate the size distribution of ion- 
ized regions within the IGM. So long as ultraviolet photons 
are responsible for reionization, it is a good approximation to 
divide the IGM into a fully neutral component inside of which 
sit discrete, fully ionized bubbles. We will make this approxi- 
m ation throughout ; we con sider some more complicated cases 
in Furlanetto et al. ( 2004b). A complete description requires 
accurately locating the ionizing sources, fully resolving the 
dumpiness of the IGM, following recombinations, and per- 
forming radiative transfer. These problems all require numer- 
ical simulations (e.g., Gnedin 2000; Sokasia n et alJl2003alli : 
"Ciardi et al."'2003 and references therein). As a result, semi- 
analytic models have avoided this question and focused on 
global quantities such as the mean neutral fraction xh- The 
simplest semi-analytic model, and the usual assumption in 
the literature, is to associate each ionized regio n with a sin- 
gle galaxy (e.g., Barkana "200^;' Loebeta 1J2004Y In this case 
the size distribution dn/dm (where m is the mass of the ion- 
ized region) follows directly from the halo mass function if 
we make the simple ansatz that 

OTion = C"%al, (1) 

where Wgai is the mass in a collapsed object and ( is some 
efficiency factor. It could, for example, be written C = 
fescf*N^/bf^7ec' 'wi'^h /esc the escape fraction of ionizing pho- 
tons from the object, /t the star formation efficiency, Nj/b the 
number of ionizing photons produced per baryon in stars, and 
Hi-ec the typical number of times a hydrogen atom has recom- 
bined. These parameters all depend on the uncertain source 
properties and can be functions of time; we will consider sev- 
eral possible values for ^ below. 

Because the mass function is steep at high redshifts, the 
resulting H II regions are quite small (see the discussion of 
Figure |3 below). This conflicts with even the most basic 
pictur es from simulations ( Sokasian et al._2003 a: .Ciardi et aP 
I2003h . "Typical" ionized regions in simulations extend to 
several comoving Mpc in radius even early in overlap, many 
times larger than Stromgren spheres around individual galax- 
ies. The reason is simply that the Stromgren spheres of nearby 
protogalaxies add, so that biased regions tend to host sur- 
prisingly large ionized regions. For example. Figure 6 of 
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ISokasian et alJ (|20033) shows that H II regions tend to grow 
around the largest clusters of sources, in this case primarily 
along filaments. In fact, the radius of the H II regions quickly 
exceeds the correlation length of galaxies, so it is difficult to 
see how to construct a model for the bubbles based on "local" 
galaxy properties. 

Therefore, in order to describe the neutral fraction field, xh, 
we need to take into account large-scale fluctuations in the 
density field. Here we describe a simple way to do so. We 
again begin with the ansatz of equation Q and ask whether 
an isolated region of mass m is fully ionized or not. Because 
it is isolated, the region must contain enough mass in lumi- 
nous sources to ionize all of its hydrogen atoms; thus we can 
impose a condition on the collapse fraction: 



/coU >fx = C 



(2) 



In the extended Press-Schechter model jBond et aUUOQll 
iLacev & Co le 1993), the collapse fraction is a deterministic 
function of the mean linear overdensity 5,,, of our region: 



/coll = erfc 



(3) 



where cr^(m) is the variance of density fluctuations on the 
scale m, a^^^^^ = cr^(m,nin), Sdz) is the critical density for col- 
lapse, and mmin is the minimum mass of an ionizing source^. 
Unless otherwise specified, we will take mmin to be the mass 
corresponding to a virial temperature of lO'* K, at which 
atomic hydrogen line cooling becomes efficient. Note that this 
expression assumes that the mass fluctuations are gaussian on 
the scale m; the formula thus begins to break down when we 
consider mass scales close to the typical size of collapsed ob- 
jects. Armed with this result, we can rewrite condition Q as 
a constraint on the density: 



S„, > SAm,z) = 6Az)-V2K(0[aii„-aHm)] 



1/2 



(4) 



where KiQ = erF'(l - C"'). We see that regions with suffi- 
ciently large overdensities will be able to "self-ionize." 

In order to compute the size distribution of ionized regions 
we must overcome two additional, but related, difficulties. 
First, we apparently must settle on an appropriate smoothing 
scale m. Second, we must take into account ionizing pho- 
tons from galaxies outside of the region under consideration. 
In other words, an underdense void nii may be ionized by a 
neighboring cluster of sources in an overdense region m2 pro- 
vided that the cluster has enough "extra" ionizing photons. 
But notice that we can solve the latter problem by changing 
our smoothing scale to m\+m2: then the net collapse fraction 
in this region would be large enough to "self-ionize." 

This suggests that we wish to assign a point in space to 
an ionized region of mass m if and only if the scale m is the 
largest scale for which condition (0} is fulfilled. If this proce- 
dure can be done self-consistently, we will not need to arbi- 
trarily choose a smoothing scale. Our problem is analogous to 
constructin g the halo mass fu nction through the excursion set 
formalism ( Bond et al. 199?): starting at m = oo, we move to 
smaller scales surrounding the point of interest and compute 
the smoothed density field as we go along. Once 5„, = 6x(m,z), 

' Note that in equation |3) tlie growth of structure is encoded in the time 
evolution of Sdz), with a^{m) constant in time. We adopt this convention in 
the rest of the paper 




Fig. 1. — The density threshold 5x(o-^,z) at several different redshifts, 
assuming = 40. The curves are for z = 20, 16, and 12, from top to bottom. 
Within each set, the solid curve is the true Sjt(m,z) and the dashed line is the 
fit B{m,z). 



we have identified a region with enough sources to ionize it- 
self, and we assign these points to objects of the appropriate 
mass. To obtain the mass function, we need to find the dis- 
tribution of first up-crossings above the curve described by 
(5^. (We are concerned only with the first-crossing distribution 
because those trajectories that later wander below the barrier 
correspond to regions ionized by sources in neighboring vol- 
umes.) Again, we need not choose a smoothing scale; each 
point is assigned to an object of mass m based on its own 
behavior 

The solid lines in Figure^show the barrier Sxiin,z) for sev- 
eral redshifts as a function of a^(m). In each case the curves 
end at o'^(COTmin); this is the minimum size of an H II region in 
our formalism. The Figure shows an important difference be- 
tween our problem and the excursion set formalism applied to 
the halo mass function. In the latter case, the barrier Sdz) is in- 
dependent of mass. Clearly this would not be a good approxi- 
mation in our case. Unfortunately, there is no general method 
for constructing the first-crossing distribution above a barrier 
of arbitrary shape (but see Sheth & Tormen [2002] for an ap- 
proximate method). The most complicated case for which a n 
analytic solution is available is a linear barrier jShethin 998ft . 
The dashed curves in Figure^show linear "fits" to the barrier 
constructed in the following way. First note that, as m oo. 



S,^Bo = 5Az)-V2K(Oa„ 
Also, at any given c^, the slope is simply 



2(a 



■2 

min 



(5) 



(6) 



We define Bi to be this slope evaluated at a = 0. The dashed 
lines in Figure^are B(m,z) = Bo + Bia^(m), i.e. a linear fit 
to the true barrier at m = oo. We see that this is a reasonable 
approximation to the true barrier shape for that are not too 
large. The fit departs from as the mass approaches the size 
of H II regions around individual galaxies. However, equa- 
tion ^ - upon which the entire approach is predicated - also 
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Fig. 2.— The bubble size distribution Qr^Vdn/dXaR at several different FiG- 3.— The global ionization history for several scenarios. The curves 

redshifts in our model, assuming C = 40 (note that R is the comoving size). that rise from zero assume C = 500, 40, and 12, from right to left; solid lines 

Dot-dashed, short-dashed, long-dashed, dotted, and solid lines are for z= are f™' ™<"iel and dashed lines are the "true" values, C,Uo\\- 
18, 16, 14, 13, and 12, respectively. These have Q = 0.037,0.11,0.3,0.5, and 
0.74. 



breaks down on small mass scales. Thus we do not consider it 
necessary to improve the fit. (In any case, changing the slope 
of the barrier to fit cr^(C'«min) exactly does not significantly 
change our results except at early times, when the bubbles are 
still quite rare.) 

The advantage of a linear fit is that we can now write the 
mass function analytically (Sheth 1998): 



dn 
dm 



2 p 



d\n(j 



d\nm 



Bo 
a(m) 



exp 



(7) 



This is the comoving number density of H II regions with 
masses in the range (m,m + dm). Figure |2l shows the result- 
ing size distributions at several redshifts for C, = 40. The dot- 
dashed, short-dashed, long-dashed, dotted, and solid curves 
correspond to z = 18, 16, 14, 13, and 12, respectively. The 
curves begin at the radius corresponding to an H II region 
around a galaxy of mass m^nin- We have normalized each curve 
by the fraction of space Q filled by the bubbles. 



Q = 



, dn 

dm V(m), 

dm 



(8) 



where V(m) is the comoving volume of a bubble of mass 
m. We show the evolution of Q for several choices of ( 
by the solid lines in_Figure|3j the curves in Figure |2] range 
from Q = 0.037 to Q = 0.74. When the ionized fraction is 
small, the ionized regions are also small, with characteris- 
tic sizes < 0.5 Mpc. At this point they are not much bigger 
than the Stromgren spheres surrounding individual galaxies. 
However, the size increases rapidly as the neutral fraction de- 
creases; when Q = 0.5, the bubbles are already several Mpc 
in size. The characteristic scale then begins to increase ex- 
tremely rapidly because Bq — > as we approach overlap (see 
Figure This behavior matches the results of the simula- 
tions cited above - although note that the scales we find can 
exceed the simulation box sizes well before overlap (see be- 
low). 

This contrasts sharply with a scenario in which we assign 
ionized regions to individual galaxies: the top panel of Figure 



13 compares the predictions of our model with one in which 
each galaxy hosts its own distinct ionized bubble. Note that 
we have not normalized the curves by Q. In the galaxy-based 
model, we see that the bubble sizes change only very slowly; 
the filling factor is dominated by the smallest galaxies. In 
such a scenario, overlap is achieved not by the growth of 
existing H II regions but through the formation of more dis- 
tinct bubbles. Also note that, provided our model is correct, 
the sizes of ionized regions cannot be determined even from 
the Stromgren spheres around "large" or galaxies. Instead 
large-scale fluctuations are required in order to understand the 
bubble pattern. The disparity between the models clearly in- 
creases rapidly as we approach overlap, but it is significant 
even at early times. 

The mass function of ionized regions has qualitatively dif- 
ferent behavior from the usual halo mass function. The most 
significant difference is that it has both low and high mass 
cutoffs. This occurs because the barrier rises more steeply 
than aim), an d it becomes harde r for a trajectory to cross it at 
small masses JSheth & Tormeri2002) .^ As a result, the mass 
function has a characteristic scale that becomes sharper as Q 
increases. In general, our model is good news for attempts 
to constrain the H II regions observationally. It predicts large 
features with a characteristic scale, which should make the 
ionized bubbles easier to identify. 

Although we have argued that the general properties of our 
model reproduce the qualitative features found in simulations, 
we have not performed a quantitative comparison. This will 
clearly require a large simulation box in order to reproduce 
the size distribution we expect. In our model, the bubble 
sizes are essentially set by la rge-scale fluctuations of the den- 
sity field; as emphasized by ' Barkana & Loebl i2003l) . the fi- 
nite simulation size will bias these fluctuations and reduce the 
characteristic sizes. A numerical simulation typically forces 
(5 = on the size scale of the box, Lbox- We can calculate 
the expected size distribution in a box of the appropriate size 

* Note that the real barrier rises more quickly than our linear approxima- 
tion. This will make the low mass cutoff occur slightly earlier, increasing the 
sharpness of the peak in the size distribution. 
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Fig. 4. — The bubble size distribution Vdn/dlnR at several different 
redshifts, assuming ^ = 40. In both panels, the thick lines show our model 
with dot-dashed, short-dashed, long-dashed, and solid lines corresponding to 
z = 18, 16, 14, and 13, respectively. In the top panel, the thin lines assume 
individual galaxies source distinct H II regions. In the bottom panel, the thin 
lines use equation |9j, with Ltox = 10/i"' Mpc = 14.3 Mpc. 



by changing the origin of the excursion set formahsm from 
(a^ = 0, (5 = 0) to (tJ^ = Cbox' = 0). The resuhs are exactly 
analogous to the extended Press-Schechter formalism for cal- 
culatrng conditional ha lo mass functions (Lacev & Qolg 1993j 
ISheth & TormerJ2002l) : the size distribution is 



dn 
dm 



box 



2 I 
TT m 

xfioexp 



d\na 



d\nm 



2[aHm)-aU 



(9) 



The results are compared to the "true" size distribution in 
the bottom panel of Figure |3 assuming Lbox = 10/i~' Mpc = 
14.3 Mpc; this is the same size as the simulations discussed 
bv lSokasian et a l. (2003a), which are among the largest vol- 
umes that have been used in numerical studies of reioniza- 
tion. We see that even state of the art simulations probably 
underestimate the sizes of ionized regions, especially during 
the late stages of overlap. Studying these epochs will require 
box sizes > 100 Mpc for a fair sample. 

Before proceeding to discuss the statistical properties of 
this distribution in more detail, we pause to note several 
caveats about our model. First, we do not necessarily have 
Q = C/coih and hence our normalization may be incorrect. One 
reason is that equation (|3jl breaks down on sufficiently small 
scales. Also, we use the linear fit B{m,z) rather than the true 
6x{in,z)- We therefore expect that our model will not properly 
normalize the size distribution when the characteristic bub- 
ble size is small. Figure |3l indicates that this is indeed the 
case; the dashed curves (which give the true ionized fraction) 
lie significantly above the solid lines of our model at early 
times (when the bubbles are small). To solve this problem, 
we multiply our number densities by C/coU / Q- Fortunately, 
the figure also shows that this correction becomes negligible 
when 2 > 0.1. Thus the correction is only necessary when 
the bubbles have a small effect anyway. The required renor- 
malization does increase if m^in increases (because equation 



l3l breaks down sooner) but is never more than ^ 20% in the 
results we show. 

Another caveat is that the excursion set formalism is fun- 
damentally Lagrangian: it follows mass elements rather than 
volume elements. More massive regions will have expanded 
less than their underdense neighbors. We do not include this 
difference in calculating the radial sizes of the regions; this is 
reasonable because regions that cross our barriers are still far 
from turnaround at these early times. Moreover, we of course 
assume spherical H II regions: in reality they have much more 
complex shapes. 

Finally, we emphasize several simplifications we have made 
to the physics of reionization. First, equation Q neglects the 
environmental dependence of the ionizing efficiency. The re- 
combination rate, for example, increases with the mean local 
density, making Q a function of the density. It is possible to 
formulate a model of reionizatio n based on the variation of the 
recombination rate with density jMiralda-Escude eFa 1I2OOQ1) : 
in such a model, low density voids are ionized first. However, 
the simulations show instead that the densest regions tend to 
be ionized first, at least if the sources are relatively faint and 
numerous (Sokasian et al. 2003a; Ciardi et al. 2003), in ac- 
cord with our model. The basic reason is that high-redshift 
galaxies are strongly biased, so their abundance increases 
faster than the local density. Thus H II regions first appear 
in the densest regions and only then escape into the voids. 
Because subsequent generations of sources also form primar- 
ily in the dense regions, this structure is preserved throughout 
the ionization process. We compare our model to one in which 
voids are ionized first in'Furlanetto et al.' ('2004b). 

Recombinations will nevertheless be important in the de- 
tailed ionization pattern inside of each ionized region. Our 
model assumes that such regions are fully ionized; in real- 
ity, they will have a complex ionization pattern determined by 
recombinations in clumpy regions, "shadowing" of ionizing 
sources by high-column density objects, and the sheets and 
filaments of the cosmic web. In what follows, we ignore these 
complications, because they will imprint signatures into e.g. 
the angular power spectrum on scales much smaller than those 
characteristic of the bubbles as a whole. 

3. CONSTRUCTING THE POWER SPECTRUM 

We will now consider how to construct observable statisti- 
cal measures of the size distribution of H II regions. If high 
signal-to-noise measurements are possible, then of course 
dn I dm of the ionized regions can be measured directly. We 
will instead consider regimes in which the H II regions cannot 
be mapped in detail but in which the fluctuation pattern can 
still be measured statistically. We will take as an example the 
21 cm power spectrum (ZFH04). The fundamental observ- 
able in 2 1 cm measurements is the brigh tness temperature of 
the ir.M dFiel JlQ.'jqUMadau et allToOTl). 

(^7i « 23 V' X 



- TCMB \ 



0.02 



0.15 



1 + z 
10 



1 1/2 



mK. 



(10) 



Here, is the hydrogen spin temperature and ip = xh(1 + S), 
where xh is the local neutral fraction and 5 is the local over- 
density; the other symbols have their usual meanings. We will 
assume throughout that Ts ^ Tcmb, so that STh is independent 
of the spin temperature; this is expected to be a good assump- 
tion soon after ionizing sources turn on (see ZFH04 for a more 
detailed discussion). 
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The simplest statistical description of the 21 cm field is pro- 
vided by the power spectrum of ?/;. In this section we will de- 
scribe how to construct this function. We will find it simpler 
to work in terms of the correlation function which can be 
constructed explicitly from its component fields (ZFH04): 



(11) 



Here is the correlation function of xh, ^ss is the correlation 
function of the density field, and £^xS is the cross-correlation 
between the two fields. The power spectrum is then the 
Fourier transform of this quantity. First, note that ^ss can be 
obtained directly using standard techniques such as the halo 
model (e.g.. lCoorav & Sh eth 2002). This includes nonlinear 
corrections to the power spectrum , although these are gener- 
ally not important on the scales of interest. Because the Xfj 
field is actually composed of isolated ionized regions, we find 
it more intuitive to work in terms of the correlation function 
of the ionized fraction, x,. However, note that by linearity the 
two correlation functions are identical. In the following sec- 
tions we describe how to build our model for We first de- 
scribe the necessary limits of the correlation function in ^3.11 
and explain why the simplest models fail. We go on to show 
how to c ompu te for a randomly distributed set of H I I re- 
gions in ^3.21 and then how t o ad d source clustering in ^3.31 
Finally, we compute ^y-s in ^3.41 Note that we will neglect 
redshift space distortions and Jeans smoothing of the baryonic 
fluctuations. 



3.1. Restrictions on 

We expect the joint probabiUty distribution to have the form 
(ZFH04) 

{xiX2) = {xi(ri)xi(r2))=xj + (xi-xj)f(ri2/R), (12) 

with ri2 = |ri -r2| and R some characteristic bubble size. The 
function / has the limits / — > for r\2^ R and / ^ 1 as 
r\2 0. That is, if two points are separated by a distance 
much smaller than the size of a typical H II region they will 
either both be ionized by the same bubble, with probability x,, 
or both neutral. But if they are well-separated relative to the 
bubble size they must reside in distinct H II regions, and the 
probability approaches xj. The correlation function is then 

ixx= {xiX2)-xj. 

In addition, we must note that x, can take values only in a 
restricted range from zero to unity. This means that x,- = 1 im- 
pUes Xi = 1 everywhere, so that the correlations vanish. Thus, 
we need = when both x, = and x, = 1 . We note that 
this limit is not satisfied by many simple models. For ex- 
ample, Santos et al. (2003) argue that xj^ss times some 
filter function that damps fluctuations inside the bubbl es; this 
form d oes not have the correct limiting behavior. Sant os et al.l 
i2003h instead enforce the limit by choosing the bubble size 
to approach infinity at overlap (however, it does not have the 
correct behavior on large scales when x, < 1). We require 
a prescription that self-consistently satisfies the above limit, 
independent of our prescription for bubble sizes. For exam- 
ple, we wish to contrast the model described in ^ in which 
the individual bubble sizes approach infinity at overlap, with 
other models in which overlap is achieved by increasing the 
number density of bubbles but not their sizes (such as placing 
H II regions around each galaxy; see Figure|3. 



3.2. A Model for 

Assuming that the sources are uncorrected, the probability 
of having sources in a volume V follows a Poisson distribu- 
tion. Thus the probability that at least one source falls into this 
region is 1 - e~"^, where n is the number density of sources. 
We begin by assuming that all the bubbles have the same size 
V/. The proper definition of the ionized fraction is 



X/ = 1 - 



-nV, _ 



= l-e- 



(13) 



We let Vo(r) be the volume of the overlap region between two 
ionized regions centered a distance r apart. The probability 
that both ri and r2 are ionized has two parts. One possibility 
is that a single source can ionize both points if it is inside 
Voirn)- Alternatively, two separate sources can ionize the two 
points. The total joint probability is then 



(X1X2) = (1- 



-nV„ 



) + e-"^"[l 



-n(V,-V,h2 



(14) 



where ¥„ = Voirn). One can easily ver ify t hat the formula 
obeys all of the restrictions described in ^3.11 

We now wish to relax the assumption of a single bubble 
size; we will index the bubble size with m for reasons that 
are probably obvious. We first calculate the probability that 
a given point is ionized. Around that point, there is a sphere 
of radius R{mi) where even the smallest allowed source (with 
index m\) is able to ionize the point. Let Pi be the probabil- 
ity that this occurs. If that does not happen, there is a shell 
surrounding this sphere in which a source of radius R(m2) can 
ionize our point; let P2 be the probability of there being a 
source of large enough size in this region. If this does not hap- 
pen, there is another shell in which a source of radius Rimj,) 
can ionize the point, etc. We can write the total probability x,- 
that the point is ionized as 

Xi = Pi + {l-Pi)P2 + (l-Pi)(l-P2)P3 + ... 



■■ 1 - exp 



: 1 - exp 



^ln(l-P,„) 
^^n(m)V(m) 



(15) 



where n(m) is the number density of sources with index m. 

The correlation function can be calculated in the same way 
as above. One possibility is that the two points are ionized by 
the same source; this requires at least one source of size m to 
be in the appropriate overlap region Voirn). The probability 
for this to occur is 



Pi\ source) = 1 - exp 



nim)Voim) 



(16) 



The other possibility is that the points are ionized by different 
sources. Again there is a succession of regions, this time of 
size Vim)-V„im), where a source of size m can ionize one 
point but not the other. Thus [c.f. equation (I14H 



{X1X2) 



1 - exp 
+ exp 
X < 1 - exp 



dn 

dm — — Voim) 
dm 



dn 

dm —— Voim) 
dm 



dm ^ [Vim) -Voim)] 
dm 



(17) 
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where we have taken the continuum Hmit for m and now 
make the explicit connection with the bubble mass. For clar- 
ity of presentation, we have suppressed the rn dependence of 
Vo(ni). This function can be written explicitly as 

_ / 4TrR\m)/3-Trri2[RHm)-rj2/l2], rn < 2R(m) 



Vo(m, ri2) = 



0, 



rn > 2R{m). 
(18) 

Here R{m) is the radius of an ionized region of mass m. 

Equation Ml\ manifestly obeys all of the limits that we ex- 
pect, and in particular ^„ is identically zero when x, = 1 . How- 
ever, it does not treat the overlap of H II regions correctly. We 
expect that when two ionized regions overlap, they expand so 
as to conserve volume (because the number of ionizing pho- 
tons has not changed). The above prescription does not con- 
serve volume. To solve this problem, we multiply the number 
density by [-ln(l-0/g]. We thus force x/ = C/coii- Note 
that, provided we assign the bubble size distribution in such a 
manner that already includes overlap - such as that of equa- 
tion Q - we would ideally like to avoid any overlap when 
constructing the power spectrum. There is no clean way to 
avoid this problem, however, and our solution is a reasonable 
one. 

3.3. Clustering 



The treatment of ^3. 21 assumes that the ionized regions are 
uncorrected on large scales. We can see this in equation Ml\ 
by noting that, when Vo{m) w 0, {x\X2) xj. Thus the corre- 
lation function vanishes on scales larger than the bubble size. 
We now show how to include (approximately) the correlations 
between the bubbles induced by density modulations. Unfor- 
tunately, the precise clustering properties are difficult to deter- 
mine because we only compute the probability that a point is 
ionized by any source; the root of the difficulty is that multiple 
sources can ionize a single point in our development, and it is 
not clear how to treat this possibility. However, the following 
simplification suffices for our purposes. We replace one of the 
factors in the last term of equation (I17> by 



1 -exp 



dm^[V-Vo][l+^(rn,m)] 
am 



(19) 



Here, £,(ri2,m) is the excess probability that the point r2 is 
ionized by a source of size m given that ri is ionized. Thus ^ 
is implicitly an average over all the sources able to ionize the 
first point. It is also approximate in that we have neglected 
the variation of the correlation function across V(m). We are 
thus assuming that such variation is small, or that r^ R. For- 
tunately, this is precisely the regime we are interested in: for 
small r, [V{m)-Voim, r)] and the correlations are strongly 
suppressed anyway. This is a manifestation of the fact that we 
care only if a point is ionized at all, not whether it is ionized 
by a single or many sources. 

We now assume that the correlation function can be written 

^(r,m) = bb(m)^ss(r), (20) 

where £,55(r) is the correlation function of the dark matter, 
b{m) is the linear bias of a source of mass m, and 



b = Q- 



dn 

dm bimyVim) — — 
dm 



(21) 



is the bias averaged over the bubble filling factor. The weight- 
ing by V enters because the probability that our point is ion- 
ized by an obj ect of size m is proportio nal to the object's vol- 
ume. Finally, ISheth & TormenI J2002h have shown that the 



bias of halos whose mass function is fixed by a linear barrier 
is 



b(m,z) = 1 + 



a^{m)B{m,z) 



(22) 



For Xi < 0.5, the bias factor is of order unity. However, it can 
become quite large near overlap. This is because the charac- 
teristic bubble size is fixed hy B ^ Bq ^ a; thus the second 
term is ^ l/B{m,z) which is large near overlap. (At over- 
lap, B « so that the entire universe is ionized.) However, 
the correlations between bubbles in this regime turn out not 
to be significant. The typical correlation length at high red- 
shifts is < 1 Mpc; once the characteristic bubble size exceeds 
this level the corr elations can be neglected on the scales of 
interest (see ^4.1l below). 

3.4. Cross-Correlation with Density 

The final step is to compute the cross-correlation between 
the density and ionization fields. If we use the halo model for 
the density field, this is straightforward: 

{xi(ri)5(r2))=-Xi + j dmu^nhimh) J d^rhu(r2-rh\mh) 



1 - exp 



dm,. 



dn 
dm. 



d^r,(l + ^,„^^,„,Xrh-r,)) 



V(mj,ri) 



(23) 



where nh{mh) is the halo mass function, ^m,,m,, is the excess 
probability of having a bubble of size nis given a halo of mass 
nth at the specified separation, and M(x|m/,) is the normalized 
halo mass profile: J c/-'xM(x|m/,) = 1 (Coorav & Sheth 2002). 

As written, this is a quadruple integral that is difficult 
to evaluate numerically. Fortunately, we can make several 
straightforward simplifications. First, we again write the cor- 
relation function as a multiple of the dark matter correlation: 
£m ,.m,. ~ b(ms)bjj(mj, )£,ss, where bh{mh) is the usual halo bias 
|Mo & White 1996). Second, we note that halos at high z 
are much smaller than both the bubble size and the separa- 
tions that we are interested in. Thus we can set Th « r2 in 
the exponential. The integration over the mass profile is then 
separable (and equal to unity). Finally, we note that with this 
approximation the average of the correlation coefficient (i.e., 
the integral over r j is also separable. We define 



(0 



dn 

dm — — b{m) 
dm 



d^r,£,ss(\r,-r2\), 



(24) 



V{m,Ti) 



where the integral over t/^r, is centered on the point ri. The 
limiting behavior of (^) is easy to understand. If the sepa- 
ration is much smaller than the characteristic bubble size R, 
the integral will be dominated by sources at typical separa- 
tions R and (^) bQS,ss{R)\ if on the other hand r > 7?, the 
integraljs dominated by sources at separations r « ri2 and 

With these two simpHfications, we can rewrite equation 
O as 

(x, 52) = il -Xi) !^1-Jdmh^ «,(m,)e-*"(""'H«> | . (25) 

The limiting behavior of the cross-correlation is also easy to 
compute. We find that on large scales 

(x,(ri)5(r2)) w (1 -xdbhbQCssim), rn > R (26) 
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Fig. 5. — The correlation functions at = 15 (top panel) and z = 13 (bottom 
panel), assuming that ^ = 40. In each panel, the solid, dashed, dotted, and 
dot-dashed curves show ^vj, ^SS^ \Lxs\< respectively. (Note that the 
xh and density fields are anticorrelated, so t;^g < 0.) 
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Fig. 6. — The power spectra at z = 15 (top panel) and z = 13 (bottom panel), 
assuming that ( = 40. In each panel, the solid, dashed, dotted, and dot-dashed 
curves show P^, Pxx, xjjPgg, and \Pt5\, respectively. (Note that the xu and 
density fields are anticorrelated, so Pyg < 0.) 



where bh is the mean halo bias (appropriately weighted) and 
R is the typical bubble size. We see that x, and the density are 
positively correlated on large scales, with a constant correla- 
tion coefficient. This happens because we associate ionized 
regions with overdensities. On small scales, 

(x,(ri)(5(r2))«(l-l,)[l-e-'"'''2«^^'^'], rn<t:R. (27) 

Thus for small separations the two fields are essentially un- 
correlated so long as ^ss(R) is not large; this is because the 
entire bubble is ionized in our model, regardless of its internal 
density structure. 

Finally, note that in this section we have computed the 
cross-correlation between the ionized fraction and the density 
field. However, in equation il 1> we want the cross-correlation 
between the neutral fraction and the density field. This is ob- 
viously = -(x,(ri)(5(r2)), so xh and the density are anticor- 
related. 

4. RESULTS 
4.1. The Correlation Function 

We begin by describing the salient features of the correla- 
tion function, Figure |5] shows example correlation func- 
tions of density, xh, and at z = 15 (top panel) and z = 13 
(bottom panel). Both assume Q = 40, which yields xh = 0.81 
and 0.52 at these two redshifts. We also show the absolute 
value of the cross-correlation ^i^-. First, we see that as r 
decreases from infinity, first becomes shallow and then 
steepens again at small separations. The steepening occurs 
when the one-halo term becomes important; generally, this is 
on scales smaller than we will be able to probe with the next 
set of low-frequency radio telescopes.'' 

We can also see that ^xx has the form expected. At small 
separations, the correlation is dominated by whether the two 
points are in a single bubble or not; in this regime ^^x — > 

' The halo model correlation function in this regime also depends on the 
choice of halo ma ss function; we use the Press & Schechter 1 1974) mass 
function. The lSheth & TormenI mass function changes the one-halo 

term by < 20% here. 



xh{1 -Xh). At large separations, the points must each be in- 
side different bubbles, so ^„ 0; actually the clustering term 
begins to dominate on sufficiently large scales and « 
(this is just visible in the top panel). The transition between 
these two regimes occurs sharply, on scales that are compa- 
rable to the typical size of H II regions. In this intermediate 
regime we can have ^ ^55, provided that the bubbles are 
sufficiently common and that the typical bubble size is signif- 
icantly larger than the correlation length of the density field. 
(If not, ^„ simply traces the correlation function of the den- 
sity field as it falls.) The cross-correlation has similar behav- 
ior, approaching a constant value at small separations and a 
constant multiple of ^xi on large scales. Note that the cross- 
correlation is negligible on small scales, essentially because 
our bubbles have no internal structure, but it can be important 
on large scales. 

The solid lines in Figure |5] are i^^. On sufficiently small 
scales the dominant term in equation ( II U is S^xx^ss^ and the 
correlation function is simply proportional to the one-halo 
term of the density field. On extremely large scales, falls 
below ^55 because of the anti-correlation between density and 
Xh- On intermediate scales, the ionized regions add only a 
small perturbation to when the neutral fraction is large, but 
dominate by a large factor in the later stages of overlap. This 
"hump" is the observational signature that we seek. 

4.2. The Power Spectrum 

Figure |6l shows the power spectra of the correlation func- 
tions in Figure |5l Here A^(A;) = (k^ /2TT^)P(k) is the dimen- 
sionless power spectrum; we show the square root of this 
quantity because in 21 cm studies ST oc t/j rather than 7/)^. 
The behavior is essentially as expected from the discussion 
of the correlation functions above. When the neutral fraction 
is large, traces the density power spectrum, with the ion- 
ized regions adding only a small perturbation on the relevant 
scales. On large scales, also traces the density power spec- 
trum, but < Pss because the bubbles are anti-correlated 
with density. Once overlap has become significant, P„ dom- 
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Fig. 7. — The redshift evolution of in the C = 12 (thin lines) and f = 40 
(thick lines) models. The curves are: xh = 0.96 (dotted), xu = 0.8 (short- 
dashed), iCH = 0.5 (long-dashed), and xh = 0.26 (solid). The redshifts in the 
two models differ 




1 



Fig. 8. — The angular power spectrum of brightness fluctuations in the 
= 40 model. The thick curves are for: xh = 0.96 (dotted), xh = 0.8 (short- 
dashed), Xh = 0.5 (long-dashed), and xh = 0.26 (solid). The thin long-dashed 
curve is for the model where galaxies host their own individual HIT regions 
(XH = 0.5). 



inates the power spectrum. In the bottom panels, the bubbles 
increase the power on and above their characteristic scale by 
nearly an order of magnitude. On large scales, our model pre- 
dicts that stops tracing Pgs when xh is small. This is be- 
cause the characteristic bubble size approaches infinity near 
overlap. The number density of the bubbles thus decreases 
rapidly when xh is small, so the "shot noise" becomes sig- 
nificant and dominates the large scale power. In this regime, 
cx k^, which is approximately satisfied at < 0.1 Mpc"' 
in the bottom panel of Figure|5] 

This Poisson noise component is to some extent an artifact 
of the simplifying assumptions inherent to our model. We 
assumed that each point is either fully ionized or fully neu- 
tral, neglecting effects like recombination, dumpiness, shad- 
owing, asphericity, etc. Once the H II regions become suffi- 
ciently large, these effects will determine the internal structure 
of the bubbles. By effectively decreasing the size and increas- 
ing number density of ionized regions, they would therefore 
reduce the power on large scales, bringing closer to the 
density power spectrum. Including these detailed effects will 
require numerical simulations. 

We note that the power spectra shown here differ from those 
of ZFH04 in that the "bubble" feature is significantly less pro- 
nounced. This is simply because that paper employed a "toy" 
model in which the bubbles had a single effective size at each 
redshift. Also, our models have relatively more power on 
large scales late in reionization, because the ZFH04 analy- 
sis reduced the bubble size at these stages while keeping the 
number density fixed. Because the shot noise decreases with 
the number density, the large scale power was small in that 
case. 

4.3. Reionization Histories 

We now consider how the power spectrum evolves with 
time for some simple reionization histories. We will examine 
only single reionization episodes here, although our formal- 
ism can accommodate other, m ore complicated histories as 
well fsee lFurlanetto et alJ2004bl) . For reference, we note that 



a "typical" Population II star formation history, with /esc = 0.2, 
= 0.05, N^ih = 3200, and Wrec = 3, would ha ve C ^ 12. This 
gives a reionization history similar to that of Sokasian et alj 
(^2003a), who adopted a Populatio n II star formation rate 
based on numeri cal simulations dHernauist & Spri ngel 200^ 
ISpringel & Hernauist 200 3). Population III stars can have C, 
an order of magnitude larger without much difficulty. The 
evolution of xh for some sample histories is illustrated in Fig- 
urep] 

Figure shows the evolution of the power spectrum for 
C = 40 (thick lines) and C = 12 (thin fines). The dot- 
ted, short-dashed, long-dashed, and solid curves have xh = 
0.96,0.8,0.5, and 0.25, respectively. These correspond to 
z ~ 18, 15, 13, and 12 (for ( = 40) and z ~ 16, 12, 10, and 9 
(for ( = 12). The most important point is that the power spec- 
trum evolves substantially throughout reionization. While xh 
declines (and hence so does the mean brightness temperature 
for 21 cm observations), the H II regions add power and more 
than compensate for the paucity of neutral gas. Thus we con- 
firm that statistical measurements can yield strong constraints 
on the ionization history. Surprisingly, the two curves show 
very little difference on scales k < 10 Mpc"'. The ampli- 
tudes, and especially the peak locations, closely match each 
other. On such scales the ionized regions dominate the power 
spectrum, so this indicates that the bubble size distribution 
for a fixed xh and m^in is nearly independent of redshift and 
(. In other words, the barrier B{m,z) is determined primar- 
ily by the neutral fraction. This points to an interesting result 
of our model: when scaled to the global neutral fraction, the 
morphology of reionization does not vary strongly. The in- 
variance does break down if ( is extremely large (greater than 
several hundred) or if Wmin is large. This is because the ioniza- 
tion field becomes more sensitive to the highly-biased peaks 
in such cases. It also breaks at small scales, where cx Pss, 
because the latter grows rapidly in this redshift range. Finally, 
we caution the reader that the behavior on large scales dur- 
ing the late stages of reionization is not well-described by our 
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Fig. 9. — The redshift evolution of /(xh^ss) in the C - 12 (thin lines) 
and = 40 (thick Hnes) models. The curves are: xh = 0.96 (dotted), xh = 0.8 
(short-dashed), xu = 0.5 (long-dashed), and Xf{ = 0.26 (solid). The redshifts 
in the two models differ 
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Fig. 10. — The redshift evolution of in the f = 500 (thin lines) and 
f = 90 (thick lines) models. In the latter case, we decrease Miniin by an order 
of magnitude from the default model. The curves are: xh = 0.96 (dotted), 
xh = 0.8 (short-dashed), xh — 0.5 (long-dashed), and xh = 0.24 (solid). The 
redshifts in the two models differ slightly. 



model (see ^4.2l l. Thus the trends we note may not hold in 
more detailed models near overlap. 

In Figure |8l we show the angular power spectrum of the 
brightness fluctuations, which measures the root mean square 
level of fluctuations as a function of multipole / (see ZFH04 
for details). To produce the figure we assumed that the obser- 
vations were done with perfect frequency resolution. In that 
case, the angular power spectrum traces A^{k) with the corre- 
spondence I ~ kD, where D is the angular diameter distance to 
the appropriate redshift. As a result, all the features in Figure 
0can be seen in Figure|8] For an experiment with finite reso- 
lution the power spectra will be somewhat different on small 
angular scales. An observed bandwidth An corresponds to a 
comoving distance 

,28, 

Vo.iMhzy V 10 y V0.15 J i' ' 

We denote by /ai/ the multipole corresponding to the angle 
subtended by L. The angular power spectrum will be given by 
A^(/ /D) for I < Iai> but will be proportional to A^,{1 /D) x 
(lAiy/l) for / > Iai^- Thus, on small angular scales the shape of 
the spectrum will change and the fluctuations will be reduced. 
For frequency resolutions around Ai^ ^0.2 MHz, realistic for 
upcoming experiments, this change in behavior will happen 
at arcminute scales, or equivalently Z ^ 10"^ (see ZFH04 for 
details). 

The behavior on small scales is better understood through 
Figure |9] which shows the ratio between ip in our model and 
that of a uniformly ionized IGM with the same xh. At large 
k, the curves each approach constant values nearly indepen- 
dent of (. That is, is nearly proportional to Pgs at large 
wavenumbers. In fact, it is easy to see from equation il l\ 
that P^/Pss ^ Xh- (There is actually an additional bias factor 
that affects the limiting value.) Figure|9lalso shows explicitly 
that the bubble peak moves to larger scales and higher am- 
plitude as reionization progresses. At a fixed neutral fraction. 

We have also neglected redshift space distortions; see ZFH04 for a dis- 
cussion of their importance. 



the peak amplitude relative to the underlying density field is 
largest at early times. This is partly because the large scale 
bubble characteristics are nearly independent of redshift while 
the density power spectrum grows as (1 +z)~^ and partly be- 
cause the bias increases with redshift. Thus (from a theoret- 
ical standpoint) distinguishing between a uniformly ionized 
medium and one with fully ionized regions will be somewhat 
easier if reionization occurs earlier 

Figure ^| compares two early reionization scenarios in 
which overlap is complete by z ^ 16. The thin lines assume 
( = 500 and the usual value of m^i„; the thick lines have ( = 90 
but decrease nij^m by a factor of 10. The redshifts in the Fig- 
ure are slightly different, ranging from z = 22.5-17.2 in the 
( = 500 model and z = 24-17 in the other As one might expect 
from the above discussion, there is relatively little difference 
between the two histories, at least in terms of the power spec- 
tra. Our model predicts that features occur on slightly larger 
angular scales for ( = 500 and have slightly less small-scale 
power The bubble feature also appears earlier in the C = 500 
model. This is because the bubbles are initially significantly 
larger in this case (both because the sources are stronger and 
because they are more rare). However, overall the differences 
are small enough that distinguishing them will probably re- 
quire a more accurate treatment of reionization through nu- 
merical simulations. When compared to Figure the ma- 
jor difference is the increased large scale power. This too is 
because the H II regions have a smaller number density and 
hence the shot noise term becomes significant earlier 

4.4. H II Regions Around Individual Galaxies 

The formalism described in general and does not rely 
on our model for the size distribution of H II regions. We can 
thus apply it to a scenario in which each galaxy hosts its own 
ionized region; the size distribution is then shown in the top 
panel of Figure|3 Such a model would be relevant if galaxies 
were truly randomly distributed. Figure ^2 shows the power 
spectra for several redshifts, again assuming ( = 40. These 
curves can therefore be compared directly to the thick lines 
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FlG. 1 1 . — The redshift evolution of if we assume that each galaxy 
hosts its individual H II region. All curves assume = 40. The lines are: xh = 
0.96 (dotted), xh = 0.8 (short-dashed), .% = 0.5 (long-dashed), and% = 0.26 
(solid). 



in Figure The difference between the models is substan- 
tial and it can also be seen in the angular power spectrum 
(see Figure First, in the present case the peak not only 
occurs at smaller scales, but its location does not shift signif- 
icantly as reionization is approached. This is because overlap 
is achieved not by increasing the sizes of individual galax- 
ies (though that does slowly happen) but instead by rapidly 
increasing their number density. Second, the large scale be- 
havior is significantly different: in the present case cx Pgs, 
and is always smaller than the power assuming a fully neutral 
medium. This is because the number of ionized regions is so 
large, making the shot noise negligible. The large scale power 
is minimized when xh ~ 0.5, because at that point the (anti- 
correlated) density and xh fields nearly cancel each other out. 
The long-dashed thin curve in Figure|8]shows the correspond- 
ing angular power spectrum for this model with xh = 0.5. We 
see that 6Ti, is nearly featureless on scales I < 10'*, with the 
bubbles appearing only at sub-arcminute scales. There is also 
a significant deficit of large-scale power. This comparison 
emphasizes the importance of considering large H II regions 
when describing reionization. 

5. THE PIXEL DISTRIBUTION FUNCTION & 
NON-GAUSSIAN SIGNATURES 

The power spectrum that we have computed in the last two 
sections is only one statistical characterization of the maps. 
Its widespread use in many studies, and in particular for the 
CMB and galaxy redshift surveys, stems from the fact that 
linear density perturbations are nearly gaussian. For such a 
field, the power spectrum completely describes its statistical 
properties. In our case, however, the ionization pattern is not 
gaussian, so other characterizations may be more powerful. 
Here, we estimate the expected probability distribution func- 
tion (PDF) of the dimensionless pixel neutral hydrogen den- 
sity ip with an eye toward developing such alternative diag- 
nostics. We caution the reader that this treatment will only 
be approximate; more sophisticated techniques, and probably 
numerical simulations, will be required for accurate descrip- 
tions. 



We begin by selecting our pixel size and the effective en- 
closed mass mpix (assuming the mean density for the volume- 
to-mass conversion). Conceptually, we divide the H II regions 
into those larger than mpix and those smaller than this mass 
scale. The PDF will have two components: a delta function 
a.tip = Q made up of the large bubbles and a broader distribu- 
tion at ip > Q composed of pixels with a mix of neutral gas 
and small bubbles. We can construct the second component 
as follows. 

Suppose we select a region with smoothed overdensity Sq. 
We first assume that the density field is gaussian; this is a rea- 
sonable approximation on most of the scales of interest (see 
below). Then we know /coii('5o) and hence also ipiSo) provided 
that we include only sources contained inside this pixel. If 
these were the only relevant ionizing sources, we could con- 
struct the PDF of tp in the usual way: pit/j) = p(S)\dS/dip\. 
However, we must also account for ionizing sources outside 
of our selected region; in other words, we need the probabil- 
ity that a pixel with a given 6q is contained within one of the 
large bubbles. In terms of the excursion set formalism, we 
wish to compute the probability that a particular trajectory 
that passes through our pixel scale at the appropriate den- 
sity has passed above our barrier B(a^) for some < a^^^; 
i.e., that it has been incorporated into a large-scale H II re- 
gion. We let p{Sl = Bcr I (^(Tpix = So) be the conditional proba- 
bility that a trajectory first crosses the barrier at given that 
it has (5(crpix) = Sq. The total probability that our pixel is fully 
ionized is then 



Pii6o)= / da^ pi5f=B,\S„ ,^ = So). 



(29) 

To compute the integrand, we use Bayes' Theorem: 

p(6i=B,\S,^,^ = So)p(S,^.,^ = So) = 

p(S.,^=So\Si=B^)p(6i = B,). (30) 

Here piSa^.^ = 5(i\5l = B \s the conditional probability that a 
trajectory has (5o on our pixel scale given that it first crosses 
the barrier at a^. This, as well as p{5), are gaussian, while 
p{5fj = Ba) is the first-crossing distribution described in ^ 
Thus 

p{5i = B,\5,^,^=5o)=-^ 



X exp 



.(31) 



If we instead used a constant barrier, note that p, could be 
written explicitly using mirror symmetry about the barrier, as 
in the standard derivation of the Press-Schechter mass func- 
tion (Bondet al. 1991). 
The PDF of i/; is then simply 



M^) = ^M) 



/=i 



d5_ 

dij) 



(32) 



The summation occurs because ipiS) is not monotonic. 
Clearly approaches zero as the density decreases. But it also 
approaches zero as (5 — > oo. At high densities, more ionizing 
sources reside in the pixel and x^ approaches zero. Thus, 
has a maximum value i}^^^ at any pixel size and redshift, and 
any given il> corresponds to two densities 5,. Also note that 
the PDF becomes singular (though still integrable) at i/imax- 
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Fig. 12. — The probability distribution p(tl>) for njpix = 10 Mq and = 
40. Dotted, short-dashed, long-dashed, and solid curves are for z = 2 0, 16, 14, 
and 13, respectively. The top panel shows the PDF of equation I32i . while the 
bottom panel approximately includes overlap with large bubbles (see text). 
The filled squares indicate the total probability of the pixel being fully ion- 
ized. In the top panel, the upper and lower squares are z = 13 and z = 14, 
respectively; in the lower panel, only z = 13 has a non-negligible probability. 



For analytic studies, it can thus be easier to consider the cu- 
mulative distribution function, P(> ip), which is always non- 
singular. 

Some examples are shown in Figure [21 We choose njpix = 
lO'"* Mq, corresponding to R k, 3.9 Mpc or an angular di- 
ameter « 2.7' at z = 13. The top panel shows p{tp) at 
z = 20, 16, 14, and 13 (dotted, short-dashed, long-dashed, and 
solid curves, respectively) in the C = 40 model. At z = 20, 
the PDF resembles a truncated gaussian. But even at z = 16, 
when f// = 0.88, ionizations have completely transformed the 
shape of the PDF. As described above, this is because the 
higher density regions have larger /coii and hence smaller neu- 
tral fractions. The high tail of the density distribution is thus 
folded over around i/jmax (which corresponds to (5 '--^ 0). More- 
over, when the density is close to the mean value, xh is nearly 
linear in the overdensity, so the peak can be quite sharp. As 
the bubbles grow, the peak shifts to smaller ijj and the tail 
grows rapidly. This occurs because high-density pixels have 
a higher probability of being mostly ionized by bubbles com- 
parable to their own size. Once the characteristic bubble size 
passes OTpix (which happens just before z = 13 for these param- 
eters, see Figure 13, the probability of = rises rapidly as 
well. 

Another way to look at this is through P(> t/j), which we 
show in Figure^] The solid curves are for the above model, 
while the dotted curves show the same function for a uni- 
formly ionized universe. (In the latter case the PDF is just a 
gaussian.) The curves are for z = 20, 16, 14, and 13, from top 
to bottom. We see clearly that even when the ionized fraction 
is only a few percent, the high-density tail is truncated and 
folded over to add to the small-?/' component. It is obvious 
from this figure that the variance is smaller in the /cou model 
during the early phases of reionization; this is again because 
high density regions and ionized bubbles are strongly corre- 
lated. However, as the bubbles grow P{ip) rapidly spreads 
toward ip = 0; this corresponds to fluctuations in the ionized 




Fig. 13. — The cumulative distribution function of i/i, assuming njpix = 
lO'^^ Mq and f = 40. The solid curves assume our model, while the dotted 
curves assume a uniformly ionized medium. In each case, the curves are for 
z = 20, 16, 14, and 13, from top to bottom. 



fraction dominating the power spectrum. This figure also ex- 
plicitly shows that the probability of having a fully ionized 
region is negligible until z ^ 13. 

This version of the PDF is not necessarily the natural choice 
for comparing to observations, because we have assigned all 
of the large bubbles to pixels with zero flux. In a real survey, 
pixels could intersect a fraction of a bubble. In the limits in 
which the characteristic bubble size is either much smaller or 
much larger than the pixel size, this complication can be ne- 
glected. We can estimate the PDF in the intermediate range if 
we assume (i) that the fully ionized part of a pixel is uncor- 
rected with the mean density of the rest of the pixel, and (ii) 
that the part outside of a large bubble follows the distribution 
of equation ( I32t : we thus neglect the fact that this component 
is smaller than the full pixel size. In this case, the only addi- 
tional component we need is the distribution of the fraction /, 
of the pixel that is inside of large ionized bubbles. This can 
be written 



>/o)= 1-exp 



dn 

dm—V^^^(m,fo) 
am 



(33) 



where ymax('M,/o) is the volume within which a bubble of the 
specified mass will cover a fraction fo or larger of the pixel; it 
is a straightforward generalization of equation ( I18t . 

In this case we simulate p{tp) through a Monte Carlo algo- 
rithm. We randomly select both /, and ip„ using their respec- 
tive cumulative distribution functions. Here, i/;„ refers to the 
part of the pixel outside of the large bubble, so "0 = fjipn- Some 
examples of p(tp) are shown in the bottom panel of FigurefT^ 
The effects of overlap with fully ionized regions are obviously 
negligible when the pixel size is significantly larger than the 
typical bubble size. When the two scales are comparable at 
z ~ 13, overlap spreads the delta function at i/; = out, but it 
does not qualitatively affect the results. If we had included 
correlations, the effect of overlap would be slightly smaller 
than shown here because pixels near ^,j,ax are less likely to be 
near ionized regions than pixels that are already mostly ion- 
ized. 



We have assumed throughout this section that the density 
field has a gaussian distribution. This is obviously only rel- 
evant on scales where the power spectrum is approximately 
linear Our approximation breaks down when higher mo- 
ments of the density field become significant. As an ex- 
ample we consider the third moment of piS). The skew- 
ness due to pure gravitational instabili ty is (6^) = S^ jS'^)^, 
where ^3 = [34/7-(3 + w)1 if P(k) oc r" (iBouchet et alJl992l: 
lluszkiewic z et alj|l993l) . The skewness only becomes signif- 
icant at m < 10'^ Mq for z > 10, so our choice of m^i^ is 
reasonable. 

We thus see that the boosts in power from the H II re- 
gions are actually a result of a qualitative change in p(ip) and 
not simply an increase in the variance of a gaussian. This 
shows explicitly that exploring statistics beyond the power 
spectrum is crucially important to understanding the 21 cm 
signal. From the estimates we have made here, the optimal 
tests are not immediately obvious; we will defer closer ex- 
amination to the future. Of course, our model is only ap- 
proximate, as we treat the perturbations in Lagrangian rather 
than real space and neglect large-scale correlations and re- 
combinations. All of these complications will be important 
in determining the true shape of the PDF and to what extent 
observations can distinguish different models, but none will 
change our general conclusion about the importance of non- 
gaussianity. For example, the shape of the PDF is determined 
entirely by our model for reionization; in other models with 
comparable power the PDF could still have a qualitativ ely dif- 
ferent sh ape. We explore one such example in Furlanet to et alJ 

6. DISCUSSION 

In this paper, we have described a new analytic model for 
the size distribution of H II regions during reionization. While 
the overlap process is usually described in terms of Stromgren 
spheres around individual galaxies, recent cosmological sim- 
ulations of reionization have demonstrated that the ionized re- 
gions are much larger than naively expected, even early in the 
reionization process. Here we have shown that the size dis- 
tribution can be understood in terms of large-scale features in 
the density field, and we have constructed the size distribution 
with an approach analogous to the standard derivation of the 
iPress & Schechter (.19_74) mass function for collapsed halos. 
The model has only two input parameters (if the cosmology 
is fixed): the ionizing efficiency of collapsed objects and the 
minimum mass halo that can host a luminous object. Interest- 
ingly, at a fixed neutral fraction the characteristic size of the 
bubbles is fairly insensitive to these input parameters, sug- 
gesting that the morphology of reionization is close to invari- 
ant (at least for simple, single reionization episodes). While 
we make several simplifying approximations in the model 
(see the discussion in it provides a self-consistent ap- 
proach that reproduces the qualitative features of simulations. 
In the future, the model must be quantitatively compared to 
simulations; however, an accurate comparison requires simu- 
lations with large 100^ Mpc-') volumes because of the ef- 
fects of the large-scale density field (see Figure |4] as well as 
lBarkana&Loeb.2003l) . 

The morphology of reionization has implications for a 
number of observables. Most important, regardless of the 
technique, we predict large H II regions that should be fea- 
sibl e to detect, whe ther through quasar absorption spec- 
tra (iMiralda-Escude et al.,2000.: Bai'kana & Loeb 2002), Lyg 
lines at extremely high-redshifts (iPello et alJ(2004ilLoeb et al.l 
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j200l iRic ottiet al ll2()0i iGnedin & Pradal l2()04l iCen et alJ 

2004; Bart on et al.ll2004j) . or 21 cm tomography. For exam- 
ple, the noise in a 21 cm map is proportional to the square 
root of the bandwidth and, more important, to the square of 
the pixel size. Our model predicts substantial contrast on 
relatively large scales of several arcminutes, which should 
make detections easier Note, however, that in many next- 
generation experiments like LOFAR the "bandwidth" can be 
chosen after the observations are complete, so the expected 
scale of the bubbles need not determine the experimental de- 
sign (Morales & Hewitt 2003). 

Even if high signal-to-noise detections of individual H II 
regions are not available, we have shown that statistical mea- 
surements of the size distribution can still strongly constrain 
reionization. We first constructed the power spectrum of fluc- 
tuations in the neutral density, including both density fluc- 
tuations and the ionized regions. We found that the ionized 
regions imprint clear features on the power spectrum and am- 
plify the power by a factor of several during the middle and 
late stages of reionization. Most important, the power spec- 
trum evolves throughout reionization, allowing us to map the 
time history of reionization. For 21 cm observations, the 
large-scale H II regions predicted by our model put the fea- 
tures at I < 10"* (or 9 > 2'). This matches well to the scales 
able to be probed by upcoming experiments like PAST, LO- 
FAR, and SKA (ZFH04). If, on the other hand, reionization 
occurred through the overlap of H II regions around individual 
galaxies, the features in the power spectrum would appear at 
much smaller scales, perhaps beyond the reach of these in- 
struments. We have also shown explicitly that the ionized 
bubbles induce qualitative changes to the initially gaussian 
neutral density distribution. This suggests that statistical mea- 
surements beyond the power spectrum can offer probes of the 
physics of reionization that may be even more powerful. 

Our results also have important implications for other mea- 
surements. For example, one of the most successful strategies 
for targeting high-redshift galaxies is by searching for strong 
Lya emitters. If the galaxy is embedded in a mostly neu- 
tral medium, Lya absorption fr om the IGM can have a sub- 
stantial effect on the line profile JHaimari2002t ISantos''2003t 
Loeb et al. 2004). In our model, we expect this absorption 
to be less significant, because most galaxies are embedded in 
H II regions with large sizes, even relatively early in the reion- 
ization process. 

Finally, in this paper we have considered only the simplest 
reionization histories with a single type of source. Many mod- 
els for reconciling the quasar and CMB data on reionization 
require multiple, distinct generations of sources that cause 
"stalli ng" o r even "double" reionization ( Wvithe & Loel 
I2OOI |ci3 l2(OT3t iHaiman & l2?M iSokasian et al 

2003bi Such histories will of course change the morphology 
of reionization and hence modify the 21 cm signal. More- 
over, alternative models of reionization in which (for exam- 
pl e) voids are ionized first yield different sets of signatures. 
In lFurlanetto et alJ J2004bl) . we use the formalism developed 
here to examine how well 21 cm tomography can distinguish 
these histories and models. 
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